On Power-Law Distributed Balls in Bins and Its Applications to View Size Estimation
Abstract
View size estimation plays an important role in query optimization, and it has been observed that many datasets follow a power law distribution. In this paper, we consider the balls-in-bins problem in which balls are placed into $N$ bins and the bin selection probabilities follow a power law distribution. As a generalization of the coupon collector's problem, we address the problem of determining the expected number of balls that need to be thrown in order to have at least one ball in each of the $N$ bins. We prove that $\Theta\left(\frac{N^{\alpha} \ln N}{c_N^{\alpha}}\right)$ balls are needed to achieve this, where $\alpha$ is the parameter of the power law distribution and $c_N^{\alpha} = \frac{\alpha-1}{\alpha-N^{\alpha-1}}$ for $\alpha \neq 1$ and $c_N^{\alpha} = \frac{1}{\ln N}$ for $\alpha = 1$. Next, when fixing the number of balls thrown to $T$, we provide closed-form upper and lower bounds on the expected number of bins that have at least one occupant. For large $N$ and $\alpha > 1$, we prove that our bounds are tight up to a constant factor of $\left(\frac{\alpha}{\alpha-1}\right)^{1-\frac{1}{\alpha}} \leq e^{1/e} \simeq 1.4$.
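The two quantities discussed in the abstract can be explored numerically with a minimal sketch like the one below. It is not the paper's method and does not reproduce its closed-form bounds. It assumes bin $i$ (for $i = 1, \dots, N$) is selected with probability proportional to $i^{-\alpha}$, normalized by the exact sum rather than by the constant $c_N^{\alpha}$. The function names `power_law_probs`, `expected_occupied_bins`, and `balls_until_all_occupied` are hypothetical, chosen only for this illustration.

```python
import random


def power_law_probs(n_bins, alpha):
    """Selection probabilities with bin i (1-indexed) proportional to i**(-alpha).

    Assumption: exact normalization by the finite sum, not the paper's c_N.
    """
    weights = [i ** (-alpha) for i in range(1, n_bins + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_occupied_bins(probs, n_balls):
    """Exact expected number of non-empty bins after n_balls independent throws.

    By linearity of expectation, bin i is non-empty with probability
    1 - (1 - p_i)**n_balls.
    """
    return sum(1.0 - (1.0 - p) ** n_balls for p in probs)


def balls_until_all_occupied(probs, rng):
    """Simulate one run: throw balls until every bin holds at least one."""
    n_bins = len(probs)
    occupied = [False] * n_bins
    remaining = n_bins
    thrown = 0
    bins = range(n_bins)
    while remaining:
        # Draw bins in batches; stop counting at the ball that completes coverage.
        for b in rng.choices(bins, weights=probs, k=256):
            thrown += 1
            if not occupied[b]:
                occupied[b] = True
                remaining -= 1
                if remaining == 0:
                    break
    return thrown


if __name__ == "__main__":
    rng = random.Random(0)
    probs = power_law_probs(n_bins=100, alpha=1.5)
    print("expected occupied bins after T=500:", expected_occupied_bins(probs, 500))
    runs = [balls_until_all_occupied(probs, rng) for _ in range(20)]
    print("mean balls to cover all bins:", sum(runs) / len(runs))
```

Averaging `balls_until_all_occupied` over many runs gives an empirical estimate of the cover time for a given $N$ and $\alpha$, while `expected_occupied_bins` gives the exact expectation that the paper's upper and lower bounds approximate for a fixed $T$.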
Similar resources
The onset of dominance in balls-in-bins processes with feedback
Consider a balls-in-bins process in which each new ball goes into a given bin with probability proportional to $f(n)$, where $n$ is the number of balls currently in the bin and $f$ is a fixed positive function. It is known that these so-called balls-in-bins processes with feedback have a monopolistic regime: if $f(x) = x^p$ for $p > 1$, then there is a finite time after which one of the bins will receive a...
Optimizing the Characteristics of the Motion of Steel Balls and their Impact on Shell Liners in SAG Mills
The equations governing the motion of steel balls and their impact on shell liners in industrial Semi-Autogenous Grinding (SAG) mills are derived in full detail by the authors and are used to determine the effective design variables for optimizing the working conditions of the mill and to avoid severe impacts that lead to the breakage of SAG mill shell liners. These design vari...
Multiple-Choice Balanced Allocation in (Almost) Parallel
We consider the problem of resource allocation in a parallel environment where new incoming resources are arriving online in groups or batches. We study this scenario in an abstract framework of allocating balls into bins. We revisit the allocation algorithm GREEDY[2] due to Azar, Broder, Karlin, and Upfal (SIAM J. Comput. 1999), in which, for sequentially arriving balls, each ball chooses two ...
Expected number of uniformly distributed balls in a most loaded bin using placement with simple linear functions
We estimate the size of a most loaded bin in the setting when the balls are placed into the bins using a random linear function in a finite field. The balls are chosen from a transformed interval. We show that in this setting the expected load of the most loaded bins is constant. This is an interesting fact because using fully random hash functions with the same class of input sets leads to an ...
Extraction Kinetics and Physicochemical Studies of Terminalia catappa L Kernel Oil Utilization Potential
Kinetics and selected variables (temperature, particle size and time) for extraction of Terminalia Catappa L Kernel Oil (TCKO) were investigated using solvent extraction. Kinetic models studied were: parabolic diffusion, power law, hyperbolic, Elovich and pseudo-second-order. In ascending order, the best-fitted models at the optimum temperature and oil yield were Elovich’s model, hyperbolic...